Software quality assurance is a protocol involving
two parties: the vendor and the customer.
As shown in Figure 1 below, the vendor produces a
software system which he submits to the customer
for purchase. The relationship between the vendor
and the customer is left unspecified in this model,
but the intent should be clear; the following table
gives three common interpretations.


[Table: three common interpretations of the vendor-customer relationship.]

With any of these interpretations in mind, the
protocol is easy to follow: the vendor presents to
an impartial evaluator his software and evidence E
which purports to show that the software performs
as advertised to the customer. The evidence should
be objective and the evaluation should be reproducible
by both the vendor and the customer. The
evaluation of the evidence is then submitted along
with the software and the evidence to the customer,
who then makes the (subjective) decision of whether
or not to accept the software.

Figure 1.
The QA Protocol

Mutation analysis [1,2,3,4] is an evaluation
technique. I will concern myself only with the
correctness of the software; therefore, the evaluation
that is returned is an indication of how well
the software has been tested by the vendor, and the
evidence used in the evaluation is a set of test
cases.

The best evaluation which can be given to the
customer is an objective probability that the program
is correct, but there are compelling reasons
for believing that such an approach is not feasible
[5]. A good alternative approach is to let the
evaluation represent a level of confidence in the
adequacy of the test cases. Test cases are adequate
if they demonstrate the correctness of the
program; in other words, in order to be adequate,
the test cases must be so exhaustive that not
only must the proffered program run correctly on the
test cases, but every incorrect program must run
incorrectly. Unfortunately, this notion of adequacy
is much too strong -- producing adequate
test cases is impossible except for very simple
programs. The fault is not with the notion of adequacy;
it is with the notion that test cases should be
insensitive to other specialized information about
the vendor. To be useful as evidence of a program's
correctness, test data need only distinguish
the program from finitely many alternatives -- the
alternatives which correspond to the most likely
errors to be introduced by the vendor. With this
motivation, the evaluation resulting from a mutation
analysis of a program P and its test data T
(the mutation score, denoted ms(P,T)) can be precisely
defined. A set of mutants of a program P,
M(P), is a finite subset of the set of all programs
written in the language of P. The set
EM(P) ⊆ M(P) consists of those programs in M(P)
which are (functionally) equivalent to P. For a
set of test cases T, DM(P,T) is the set of programs
in M(P) which give results differing from P on at
least one point in T. A mutation score for P,T is
defined as follows:

$$ms(P,T) = \frac{|DM(P,T)|}{|M(P)| - |EM(P)|}$$
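
The definition can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the program P, the four hand-written mutants, and the choice to represent programs as functions of one integer are hypothetical, and EM(P) is supplied by hand because functional equivalence cannot be decided automatically in general.

```python
def P(x):
    # Hypothetical program under test.
    return x * 2


# Hypothetical mutant set M(P); each mutant is a small variation of P.
MUTANTS = [
    lambda x: x + 2,   # agrees with P only at x = 2
    lambda x: x * x,   # agrees with P only at x = 0 and x = 2
    lambda x: x + x,   # functionally equivalent to P
    lambda x: x * 3,   # agrees with P only at x = 0
]

# EM(P): the equivalent mutants, identified by hand (an assumption).
EQUIVALENT = [MUTANTS[2]]


def mutation_score(p, mutants, equivalent, tests):
    """Return ms(P,T) = |DM(P,T)| / (|M(P)| - |EM(P)|)."""
    # DM(P,T): mutants whose result differs from P on at least one test point.
    differing = [m for m in mutants if any(m(t) != p(t) for t in tests)]
    denominator = len(mutants) - len(equivalent)
    if denominator == 0:
        raise ValueError("every mutant is equivalent; ms(P,T) is undefined")
    return len(differing) / denominator


if __name__ == "__main__":
    for tests in ([2], [0, 2], [2, 3]):
        print(tests, mutation_score(P, MUTANTS, EQUIVALENT, tests))
```

In this toy setting the single test point 2 distinguishes only one of the three non-equivalent mutants, while the pair 2, 3 distinguishes all of them and gives a score of 1.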

The  central  problem  for  quality  assurance  is 
then  to  choose  a  function  M  so  that 

1.  ms(P,T) ≈ 1 exactly when T demonstrates
the  correctness  of  P  with  a  high  level 
of  confidence,  and 

2.  ms(P,T)  is  a  relatively  cheap  measure  to 
compute. 

The  advantage  of  such  a  measure  is  that  it 
satisfies  the  basic  requirements  of  the  QA  protocol. 
Since  the  sets  M(P)  and  EM(P)  are  fixed  beforehand, 
the measure is objective, and since DM(P,T) depends
only on being able to execute P-like programs, the
measure is reproducible. Condition 1 ensures that
the  results  of  the  evaluation  are  reliable,  and 
Condition  2  provides  that  the  evaluation  process 
will  not  be  excessively  burdensome  to  apply. 

In  a  series  of  automated  mutation  analysis 
systems, the concept of choosing M(P) by making
"simple"  mutations  has  been  explored  (see,  e.g., 
[3]).  The  underlying  assumption  for  such  a  choice 
has  been  that  these  mutations  correspond  to  the 
errors  most  likely  to  be  made  in  producing  P.  How 
good  is  test  data  T  (the  evidence)  such  that 
ms(P,T)  =  1?  An  approach  to  this  problem  has  been 
formulated  in  a  recent  thesis  at  Georgia  Tech  [6]. 

One measure of how good the ms measure might be
is how many "complex" mutants it leaves "unexplained".
An experiment to investigate this effect
for Cobol programs is discussed in [6]. The programs
P1-P6 are representative Cobol programs in
the 100-700 line range. To test the measure
ms(Pi,Ti) one first derives test data Ti so that
ms(Pi,Ti) = 1. The question then becomes how many
complex mutants give the same results as Pi on Ti.
Any such complex mutants are said to be uncoupled.
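
The shape of such an experiment can be sketched in Python on a toy example. The linear program, the four simple mutations, and the use of parameter equality as the equivalence check are all hypothetical choices made for illustration; the experiment in [6] used Cobol programs and random samples of pairs, whereas this sketch simply enumerates every pair.

```python
from itertools import combinations

BASE = (2, 1)  # hypothetical program P(x) = 2*x + 1, stored as (a, b)
SIMPLE = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # simple mutations: change a or b by one


def run(params, x):
    a, b = params
    return a * x + b


def mutate(deltas):
    """Apply one or more simple mutations (given as deltas) to BASE."""
    a, b = BASE
    for da, db in deltas:
        a, b = a + da, b + db
    return (a, b)


def survives(params, tests):
    """True if the mutant gives the same result as P on every test point."""
    return all(run(params, t) == run(BASE, t) for t in tests)


def coupling_experiment(tests):
    """Return (complex mutants tried, survivors, survivors not equivalent to P)."""
    # Precondition: T distinguishes every simple mutant, i.e. ms(P,T) = 1.
    if any(survives(mutate([d]), tests) for d in SIMPLE):
        raise ValueError("test data does not give ms(P,T) = 1")
    # Complex mutants: every pair of simple mutations applied together.
    complex_mutants = [mutate(pair) for pair in combinations(SIMPLE, 2)]
    uncoupled = [m for m in complex_mutants if survives(m, tests)]
    # For this toy family, a mutant is equivalent to P exactly when its
    # parameters equal BASE; real programs need a manual equivalence check.
    not_equivalent = [m for m in uncoupled if m != BASE]
    return len(complex_mutants), len(uncoupled), len(not_equivalent)


if __name__ == "__main__":
    print(coupling_experiment([1]))     # weaker test data
    print(coupling_experiment([0, 1]))  # stronger test data
```

With the single test point 1, two non-equivalent complex mutants survive because their two changes cancel at that point; adding the test point 0 leaves only the survivors that are equivalent to P.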

A key point to be settled in such an experiment is
what is meant by "complex." The experimental results
shown in Figure 2 use complex mutants resulting
from random pairs of simple mutants, while the
results shown in Figure 3 use complex mutants resulting
from random correlated pairs of simple mutants.
A detailed justification for using random
pairs and correlated pairs in two separate experiments
is given in [3]. The quantity "survives" denotes
the number of complex mutants that are left
uncoupled by Ti, and "not equivalent" denotes the
number of uncoupled mutants that are not functionally
equivalent to Pi -- an important quantity
since the functionally equivalent mutants cannot
be distinguished by any test data and therefore do
not affect the strength of the ms(Pi,Ti) measure.
The diagrams in Figures 2 and 3 show the 95% confidence
intervals on the quantity (z*100,000),
where z is the probability that a randomly selected
pair of simple mutants (correlated mutants) is uncoupled
for test data Ti. The sizes of the samples
(50,000 for Figure 2 and 10,000 for Figure 3)
reflect the relative sparseness of the set of correlated
simple mutants.
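
As a rough illustration of how an interval on z*100,000 can be obtained from a sample, the sketch below uses the Wilson score interval for a binomial proportion. This is an assumption: the text does not say which interval construction was used, and the function names and any counts passed to them are hypothetical.

```python
import math


def wilson_interval(uncoupled, sample_size, z=1.959964):
    """Approximate 95% Wilson score interval for a binomial proportion.

    uncoupled   -- number of sampled pairs left uncoupled by the test data
    sample_size -- number of pairs sampled (e.g. 50,000 or 10,000)
    z           -- normal quantile; 1.959964 corresponds to 95% confidence
    """
    if sample_size <= 0 or not 0 <= uncoupled <= sample_size:
        raise ValueError("need 0 <= uncoupled <= sample_size and sample_size > 0")
    p_hat = uncoupled / sample_size
    denom = 1 + z * z / sample_size
    centre = (p_hat + z * z / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def per_100000(uncoupled, sample_size):
    """Scale the interval on z to the quantity z*100,000."""
    low, high = wilson_interval(uncoupled, sample_size)
    return low * 100000, high * 100000


if __name__ == "__main__":
    # Hypothetical counts, not taken from the figures.
    print(per_100000(0, 50000))
    print(per_100000(3, 10000))
```

The Wilson form is chosen here only because it behaves sensibly when the observed count is zero or very small, which is the regime these experiments are in.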

Program   Survives   Not Equiv.   95% c.i.

Figure 2.
50,000 Random Pairs of Mutants for Each Pi

Figure 3.
10,000 Correlated Pairs for Each Pi

Essentially the same experiment has been performed
on other categories of complex mutants with
even more dramatic results: there are no uncoupled
mutants! The rare uncoupled mutants in these experiments
seem to fall into three classifications,
all of them tied to the way in which loops and decisions
are tested. Since the number of execution
paths in a looping program can be infinite and in
a loop-free program can be exponential in the number
of program statements, there is no computationally
feasible method of testing all program paths;
the uncoupled mutants in these experiments seem
to be due to this fact. Surprisingly, the results
of [6] also seem to indicate that the existence of
uncoupled mutants is not related to the branching
complexity of the program.

Condition 2 above asks that the calculation of
ms(P,T) be efficient. As is discussed in [3], the
complexity of the calculation, when expressed as
|M(P)|, tends to be roughly quadratic in the size
of P. Since there are several heuristics available
for speeding up the calculation of ms(P,T) from
M(P), this is quite acceptable for programs in the
100-5000 line range. In fact, medium-scale programs
in the range of 1000 lines have been tested
on a fully loaded DEC-20 at 5-10 times coding rates
for similar production codes.

For evaluations of very large monolithic programs,
it may be necessary to sample a small fraction
of the available mutants in M(P) and use the
results of the sample to infer the strength of the
test data. In an experiment also reported in [6],
the programs P1-P6 were tested using the sampling
strategy and the resulting tests were then evaluated
using conventional mutation analysis. Approximately
10% of the mutants in M(P) were selected at
random for elimination. The resulting test data
T1, ..., T6 were used to calculate ms(Pi,Ti), with
the following results:

ms(P1,T1) = 1
ms(P2,T2) = 1
ms(P3,T3) = .99
ms(P4,T4) = .99
ms(P5,T5) = .99
ms(P6,T6) = .99.

This procedure is clearly valuable in reducing
the cost of mutation analysis and, as these
results demonstrate, sacrifices very little in the
way of evaluation.

Basic investigations into the effectiveness
of mutation analysis as a tool for quality assurance
continue. As the recent experimental [4,6]
and theoretical [4] results suggest, however,
mutation analysis can be a very attractive tool
for practical QA as well as for program testing.